# Multilingual Speech Processing
Phi 4 Multimodal Instruct
MIT
Phi-4-multimodal-instruct is a lightweight open-source multimodal foundation model that supports text, image, and audio inputs to generate text outputs, with a context length of 128K tokens.
Multimodal Fusion
Transformers
Supports Multiple Languages
mjtechguy
18
0
English Filipino Wav2vec2 L Xls R Test 07
Apache-2.0
This model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english, trained further on Filipino speech datasets to adapt the English base model for Filipino speech recognition.
Speech Recognition
Transformers
Khalsuu
24
0
Wav2vec2 Xlsr Nepali
Apache-2.0
This is a Nepali speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.
Speech Recognition
shishirAI
22
2
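The two wav2vec2 XLSR entries above are fine-tuned for speech recognition. Assuming, as is usual for wav2vec2 XLSR fine-tunes, that they use a CTC (connectionist temporal classification) output head, a transcript is recovered from the per-frame scores by greedy decoding: pick the best token for each audio frame, merge consecutive repeats, and drop the blank token. The minimal pure-Python sketch below illustrates only that decoding step. The helper name `ctc_greedy_decode`, the toy vocabulary, the `<pad>` blank at index 0 and the `|` word delimiter are hypothetical choices modeled on common wav2vec2 character vocabularies, not read from either model's configuration.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy (best-path) CTC decoding (hypothetical helper, not a library API).

    frame_scores holds one row of scores (logits or probabilities) per audio
    frame, with one column per vocabulary entry.
    """
    # 1. Pick the highest-scoring vocabulary index for every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    # 2. Merge consecutive repeats, then drop the blank token. A blank between
    #    two identical indices keeps them as two separate characters.
    tokens: List[str] = []
    previous: Optional[int] = None
    for idx in best_path:
        if idx != previous and idx != blank_id:
            tokens.append(vocab[idx])
        previous = idx

    # 3. Assumed convention: the word delimiter marks spaces between words.
    return "".join(tokens).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    # Hypothetical toy vocabulary; index 0 is the assumed CTC blank.
    vocab = ["<pad>", "|", "a", "k", "o", "y"]
    # Hypothetical best path for the toy transcript "ako ay", one index per frame.
    path = [2, 2, 0, 3, 4, 4, 1, 2, 0, 5]
    # Turn each index into a one-hot score row so that the argmax recovers it.
    frames = [[1.0 if i == k else 0.0 for i in range(len(vocab))] for k in path]
    print(ctc_greedy_decode(frames, vocab))  # prints: ako ay
```

Greedy decoding is the simplest choice; beam search, optionally combined with a language model, often yields better transcripts at extra computational cost.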
S2t Wav2vec2 Large En Tr
MIT
A Transformer-based end-to-end speech translation model that converts English speech into Turkish text.
Speech Recognition
Transformers
Supports Multiple Languages
facebook
55
3
S2t Small Covost2 En Et St
MIT
This is a Transformer-based end-to-end speech translation model specifically designed for converting English speech into Estonian text.
Speech Recognition
Transformers
Supports Multiple Languages
facebook
15
0
S2t Small Covost2 En Ca St
MIT
This is a Transformer-based end-to-end speech translation model specifically designed to translate English speech into Catalan text.
Speech Recognition
Transformers
Supports Multiple Languages
facebook
15
0